ABSTRACT

Rubin's model for causal inference in experiments and observational studies
is enlarged to analyze the problem of "causes causing causes" and is compared to
path analysis and recursive structural equations models. A special
quasi-experimental design, the encouragement design, is used to give concreteness to
the discussion by focusing on the simplest problem that involves both direct and
indirect causation. Rubin's model is shown to extend easily to this situation
and to specify conditions under which the parameters of path analysis and
recursive structural equations models have causal interpretations.



Key words: Rubin's model, encouragement designs, experiments, randomization,
observational studies, causal models, quasi-experiments, self-selection,
self-administered treatments, simultaneous equations models.

1. INTRODUCTION

I believe that the perspective on causal inference developed extensively 
by Rubin (1974, 1977, 1978, 1980) provides a solid basis for considering issues 
of causal inference in complex cases and that it is the only one that is grounded 
in the place where causal inferences are relatively uncontroversial:
experimental science. In Holland (1986a, b), I described this perspective and dubbed it
"Rubin's model," as I will refer to it here, too, even though a more general
reference, such as "the experimental model," may be more appropriate. Robins
(1984, 1985, 1986) gives a closely related model in the context of
epidemiological studies. One goal of this paper is to extend Rubin's model to accommodate
a class of quasi-experimental procedures that are called "encouragement
designs" by Powers and Swinton (1984). These designs involve both
randomization and self-selection as well as both direct and indirect causation.
Encouragement designs provide a simple yet useful "laboratory" in which the
issues of direct and indirect causal relationships can be carefully examined. I
hope to clarify the relationship between the systematic structure of Rubin's
model and the less formal approaches of path analysis and structural equations
modeling. My own experience has been that any discussion of causation is
enriched by an analysis using Rubin's model; see, for example, Holland and Rubin
(1987), Rosenbaum (1987) and Holland (1988).

Before proceeding I want to make a few general comments about causation to
set the stage for the subsequent discussion. 

In my view, in most discussions of causation, all too little attention is 
given to distinguishing the question of "What is the cause of a given effect?"
from that of "What is the effect of a given cause?" Since the time of
Aristotle, philosophers have tried to define what it means for A to be a cause
of B. This activity still continues (Lewis, 1987 and Marini and Singer, 1988).
Yet, the attribution of causation has been known to be fraught with difficulty
since, at least, Hume's analysis in the mid 1700s. A statement like "A is a
cause of B" is usually false in the sense that it is, at best, a tentative
summary or theory of our current knowledge of the cause (or causes) of B. For
example, do bacteria cause disease? Well, yes... until we dig deeper and find
that it is the toxins the bacteria produce that really cause the disease. Yet
this is not quite correct either (certain chemical reactions are the real
causes), and so on, ad infinitum. Experiments, on the other hand, do not
identify causes. Rather, an experiment results in the measurement of the effects of
given causes (i.e. the effects of the experimental manipulations). The results
of an experiment can be summarized by a statement of the form "an effect of A is
B", but not by one of the form "A is a cause of B" unless we mean by the latter
no more than the former. I would be surprised if most modern scientists would
be willing to equate theoretical statements like "A is a cause of B" with
empirical regularities like "the effect of A is B". Theories may come and go,
but old, replicable experiments never die; they are just reinterpreted.

The strength of Rubin's model is that it builds on the success of
experimentation and focuses on the measurement of the effects of causes rather than
attempting to identify the causes of effects. Statistics has made major
contributions to issues of causal inference when it has addressed the problem of
measuring the effects of causes. It does less useful things, in my opinion,
when its methodology claims to identify the causes of effects. Rubin's model
focuses our attention on what can be done well rather than on what we might like
to do, however poorly.

I do not mean to imply that the search for the causes of a phenomenon is a
useless endeavor: indeed, it is a driving force that motivates much of science.
Rather I mean that a logical analysis of the search for causes follows from an
analysis of the measurement of causal effects and it is not logically prior to
this more basic activity. Defensible inferences about the causes of an effect
are always made against a background of measured causal effects and relevant
theories.

I have tried to follow several goals in writing this paper. First, I
discuss population quantities rather than sample estimates of population
quantities. Thus, it is best to think of the populations that occur here as large
or infinite. I do not apologize for this, since such a view is implicit in most
discussions of path analysis. My aim is to define causal parameters, rather 
than to discuss ways of estimating them. Second, I may, on occasion, appear 
overly notational, and I apologize for that. My defense is that I wish to be 
very clear about what I mean, and since causation is a subtle idea, an adequate 
notation is essential to understanding it. Unfortunately, my notation is not 
identical to that usually used in path analysis or structural equations models, 
but it is only intended to be more explicit than these other schemes are. 
Finally, my goal is to put the, to me, complex and intuitive models used in path 
analysis into a framework that I find helpful in complex problems, and I hope 
others find it useful, too. 

Summary of Rest of Paper 

In section 2, I define and give an example of an "encouragement design" 
that is used in the rest of this paper to focus the discussion. I think these 
designs are interesting in their own right because they attempt to measure the
effects of self-selected treatments. Section 3 reviews, from my point of view,
three related topics that concern path analysis: deterministic linear systems,
path analysis and recursive structural equations models. In section 4, I extend
Rubin's model to the case of encouragement designs to allow for both direct and
indirect causation. A short discussion ends the body of the paper. I also
include an appendix on Rubin's model applied to experiments and observational
studies in order to make the paper reasonably self-contained.

2. ENCOURAGEMENT DESIGNS

While it is common to discuss path analysis and causal models in terms of 
abstract systems of variables, I find it easier to discuss issues of causal 
inference in the context of specific examples or classes of examples. For this 
reason I will use a fairly concrete quasi-experimental design, the
"encouragement design," as the basis of my discussion of causal theories that
involve direct and indirect causation. I feel that the encouragement design is 
a simple and relatively clear-cut type of study in which many of the issues of 
direct and indirect causation arise. 

I will introduce encouragement designs by giving an example that is used 
throughout the rest of this paper. Suppose we are interested in the effects of 
various amounts of study on the performance of students on a test. I will
suppose that there are two experimental treatments: one that encourages a student
to study for the test (t = treatment) and one that does not (c = control).
After exposure to one of these treatments, a student will then study for the
test for some amount of time, R. Subsequently, the student is tested and gets a 
test score, Y. An example of an encouragement design, similar to the one just 
described, is given in Powers and Swinton (1984). My first exposure to a formal 
analysis of encouragement designs was in Swinton (1975). 

The only experimental manipulation in an encouragement design is exposure 
to the “encouragement conditions” — which are just t or c here, but they could 
involve more than two levels, of course. Hence, using standard methods one can 
measure the effect of encouragement on the amount of study, i.e. R, as well as 
on test performance, i.e. Y. However, we may also be interested in the effect 
of studying on test performance. Thus, random assignment of encouragement
conditions might be possible, but the students will then self-select their own
exposure levels to the amount they study, R. This self-selection is a critical
feature of encouragement designs and it is why I have chosen to refer to them as
a type of "quasi-experimental" design, after Campbell and Stanley (1966). The
other critical feature of encouragement designs is the analyst's interest in
measuring the causal effect of the amount of study, R, on test performance. I 
have chosen this example specifically because from an individual student's point 
of view, the amount one studies is a self-imposed treatment that can be measured 
and over which one can exercise control. However, from the analyst's point of 
view, the amount a student studies is a response to the encouragement condition, 
as is the student's test performance. In this very special type of situation, 
"amount of study" plays both the role of a response and the role of a self- 
imposed treatment; i.e., it is both an effect and a cause. 

Encouragement designs can arise in any study of human subjects in which the 
treatments or causes of interest must be voluntarily applied by the subjects to 
themselves. Other potential examples are medical studies that encourage 
voluntary healthful activities among patients or economic studies that attempt 
to alter people's spending behavior by various inducements. The analysis of 
surgical trials may involve randomization of the "intention to treat" to 
patients, but because of clinical intervention, the actual treatment patients 
get may not be the one to which they were randomly assigned. This is similar to 
an encouragement design, but the models discussed in this paper may not be 
appropriate to that case, since I treat "amount of study" as a continuous
variable. The general ideas are the same, however.

I suspect that encouragement designs are quite widespread but may not
always be recognized. On the other hand, the special nature of these designs
cannot be overemphasized, in my opinion. While it is plausible that "amount
of study" is both an effect and a cause, this dual role is not always a
plausible assumption, and ignoring this fact can lead to some rather curious
causal statements. It is critical, in the analysis developed here, that those
things that play the role of causes or treatments have levels that are, in
principle, alterable. The statement "I could have studied but I didn't" has this
flavor, but "I might have scored higher on the test but I didn't" does not. See
Holland (1986b, Section 7, and 1988) for more emphasis on this very important
point.

The basic elements of an "encouragement design" are, thus, (a) an
experimental manipulation of "degrees" of encouragement (here, just t and c) to
perform some activity, (b) measurement of the subsequent amount of the encouraged
activity, (c) measurement of a final outcome or response variable and (d) an
interest in measuring the causal effect of the encouraged activity on the
response variable. Encouragement designs are more often applied to human
populations than to other types of experimental units because of the self-selected
or voluntary nature of much of human activity.

3. DETERMINISTIC LINEAR SYSTEMS, PATH ANALYSIS, AND RECURSIVE STRUCTURAL
EQUATIONS MODELS

There are three related topics reviewed in this section — deterministic 
linear systems, path analysis and structural equations models. All three will 
arise in my discussion of encouragement designs in section 4. I frame this 
review in terms of the structure of encouragement designs. 

Extended discussions of path analysis and structural equations models may 
be found in many places, for example, Blalock (1964, 1971), Duncan (1966, 1975), 
Freedman (1987), Goldberger (1964), Goldberger and Duncan (1973), Heise (1975), 
Kenny (1979), Saris and Stronkhorst (1984), Tukey (1954), and Wright (1934). I
follow Tukey (1954) in not standardizing the variables to have zero mean and 
unit standard deviation and in emphasizing regression coefficients rather than 
standardized regression coefficients. 

3.1 Deterministic Linear Systems

Suppose there are two linear functions, f and g, of two variables s and r,
of the form

    f(s) = a s + d,                                                    (1)

and

    g(s, r) = b s + c r + d',                                          (2)

where a, b, and c are the important slope parameters and d and d' are constants
that play no essential role in this theory. We introduce a third variable, y,
into this system via the definition

    y = g(s, r),                                                       (3)

and we assume that r and s are related by the functional relationship

    r = f(s).                                                          (4)

Such a system captures the idea that r is functionally dependent on s and y is
functionally dependent on s and r. Since f and g are linear, changes in y and 
r are determined by the slope parameters, a, b, and c. This system may be 
represented by the "path" diagram in Figure 1. 



[Figure 1. Path diagram: s -> r (coefficient a), r -> y (coefficient c), s -> y (coefficient b).]




The coefficients a, b, and c are the "path" coefficients, or the "direct 
effects"; i.e., a is the direct effect of s on r, c is the direct effect of r on 
y, and b is the direct effect of s on y. 

The "total effect" of s on y is found by substituting the equation for r 
into that for y. This yields

    y = g(s, f(s)) = b s + c(a s + d) + d'
                   = (b + c a) s + (c d + d'),

so that

    y = (b + a c) s + (c d + d').                                      (5)

Hence, the total effect of s on y is b + ac, which may also be calculated as the 
sum of the products of all the direct effects along all the paths connecting s 
and y in the path diagram in Figure 1; i.e., s to y yields b, and s to r to y 
yields ac, so the sum is b + ac.

I use the phrase "deterministic linear system" to refer to a path
diagram that arises from a set of nonstochastic linear equations like the ones 
just described. 
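
The substitution that leads to (5) can be checked numerically. The following is only a minimal sketch: the numerical values of a, b, c, d and d' are hypothetical and chosen purely for illustration, and the function names f, g and y_of_s simply mirror equations (1) through (4).

```python
# Hypothetical slope and intercept values, chosen only for illustration.
a, d = 2.0, 1.0                  # r = f(s) = a*s + d, equation (1)
b, c, d_prime = 3.0, 5.0, 10.0   # y = g(s, r) = b*s + c*r + d', equation (2)


def f(s):
    """Equation (1): r as a linear function of s."""
    return a * s + d


def g(s, r):
    """Equation (2): y as a linear function of s and r."""
    return b * s + c * r + d_prime


def y_of_s(s):
    """y as a function of s alone, after substituting r = f(s)."""
    return g(s, f(s))


# Total effect: change in y per unit change in s when r responds through f.
total_effect = y_of_s(1.0) - y_of_s(0.0)

# Sum over paths: s -> y contributes b, and s -> r -> y contributes a*c.
path_sum = b + a * c
```

With these illustrative values both quantities equal 13, which agrees with the path-tracing rule stated above.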

Viewed simply as a visual representation of a deterministic linear system, 
path diagrams are easy to understand and can help keep track of the bookkeeping 
that is associated with the total effects of one variable on another. The real 
appeal of path diagrams arises in systems that involve more than three 
variables, but the basic ideas are already present in systems with three. 

The path diagram in Figure 1 is not the only one we could draw using three 
variables, but it is relevant to encouragement designs in the following way. If
s = 1 or 0 as there is encouragement or not and if r denotes the resulting
amount of study and y denotes the subsequent test score, then the parameters
a, b, and c have the following interpretation. The change in amount of study
due to encouragement to study is a, and b + ac is the change in test scores
due to encouragement to study. The change in test scores due to a unit change
in the amount of study within each level of the encouragement condition, s, is c.

When one of the coefficients a, b, or c is zero, it is customary to delete 
the corresponding arrow from the path diagram. For example, if b = 0, we have the
diagram in Figure 2.

[Figure 2. Path diagram: s -> r (coefficient a), r -> y (coefficient c); no arrow from s to y.]

In the encouragement design example, the path diagram in Figure 2 would be
interpreted as displaying no effect of encouragement on test scores except
through its effect on studying. We will return to this idea later in section 4.



Deterministic linear systems not only motivate the nondeterministic linear
models of path analysis and structural equations models but also play a role in
what I call the ALICE "causal model" in section 4.2.

3.2 Path Analysis

Deterministic linear systems do not really describe data, except in certain
special circumstances, usually in the physical sciences. Suppose instead that
there is a population U of "units" and that for each unit u in U we can obtain
measurements on three numerical variables, S(u), R(u), and Y(u). In our
application, the units are students; S(u) = 1 if u is encouraged to study, and
S(u) = 0 otherwise; R(u) is the amount that u studies; and Y(u) is u's test score.

As u varies over U, (S(u), R(u), Y(u)) forms a trivariate distribution.
This distribution can be used to define quantities such as the conditional
expectation of R given S, E(R | S = s). This conditional expectation is the average
value of R for those units in U for which S(u) = s. The conditional expectation
E(Y | R = r, S = s) has a similar definition in terms of averages over U. The
expected value E(Y | R = r, S = s) is the "true" regression function of Y on R and S
in the sense that it is what one is trying to estimate by a least squares
regression fit of Y regressed on R and S. However, in general, E(Y | R = r, S = s)
need not be linear in r and s.

In the example of an encouragement design, there is a natural "causal order"
to the variables S, R, and Y. S comes first, then R, and then Y. A path analysis
uses a causal ordering to focus on certain regression functions; in the
encouragement design, they are the two described above: E(R | S = s) and
E(Y | S = s, R = r). Suppose, for simplicity, that they are both linear, i.e., that

    E(R | S = s) = f(s) = a s + d                                      (6)

and

    E(Y | S = s, R = r) = g(s, r) = b s + c r + d'.                    (7)

This defines a deterministic linear system, as described above, when we identify
y with g(s, r) and equate r and f(s). We may associate the path diagram in
Figure 1 with this system, but because we are dealing with the measurements S,
R, and Y rather than the abstract variables s, r, and y, we relabel the nodes of
the graph with S, R, and Y, as in Figure 3.

[Figure 3. Path diagram: S -> R (coefficient a), R -> Y (coefficient c), S -> Y (coefficient b).]

The path coefficients in Figure 3 are just the (population) linear regression
coefficients that may be estimated by a (linear) regression of R on S, and of Y
on S and R. The same terminology is used as before for the direct effects:
the regression coefficients are the direct effects. The "total effect" of S on
Y, i.e. b + ac, has the nice interpretation of being the coefficient of S in the
regression of Y on S alone, i.e.

    E(Y | S) = E(E(Y | S, R) | S)
             = E(b S + c R + d' | S)
             = b S + c E(R | S) + d'
             = b S + c(a S + d) + d'
             = (b + a c) S + (d c + d').                               (8)

I will use the phrase "empirical path diagram" to refer to any path diagram
constructed from a causal ordering and the implied set of linear regression
functions. An empirical path diagram is, therefore, simply the result of
computing certain regression coefficients and arranging them in the appropriate
places in the diagram.
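
To make the construction of an empirical path diagram concrete, the sketch below computes the relevant population regression coefficients for a small artificial population and checks the total-effect identity in (8). Everything numerical here is an assumption made for illustration: the eight units, the values 3, 5 and 10, and the helper names solve and ols are hypothetical. The labeling follows Figure 1 and equation (8), with b multiplying S and c multiplying R. Exact rational arithmetic is used so that the identity holds without rounding error.

```python
from fractions import Fraction


def solve(A, y):
    """Solve the square linear system A x = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [row[n] for row in M]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical population of 8 units. Within each (S, R) cell the deviation e
# averages to zero, so E(Y | S, R) is exactly linear with b = 3, c = 5, d' = 10.
units = []
for s, r_values in [(0, (0, 2)), (1, (2, 4))]:
    for r in r_values:
        for e in (-1, 1):
            units.append((s, r, 3 * s + 5 * r + 10 + e))

S = [u[0] for u in units]
R = [u[1] for u in units]
Y = [u[2] for u in units]

d_hat, a_hat = ols([[1, s] for s in S], R)                        # R on S
dp_hat, b_hat, c_hat = ols([[1, s, r] for s, r in zip(S, R)], Y)  # Y on S and R
const_hat, total_hat = ols([[1, s] for s in S], Y)                # Y on S alone
```

In this artificial population the regression of Y on S alone has slope 13, which equals b_hat + a_hat * c_hat, and intercept 15, which equals d_hat * c_hat + dp_hat, as (8) requires.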

In path analysis, the causal order is given, often rather vaguely, by some 
sort of theory. One nice feature of encouragement designs is that the causal 
order of "S before R before Y" is consistent with the way the data might be
collected and with our intuition about study and test performance. Once given,
the causal order tells us which (linear) regression functions to estimate from
the data. In the estimated regression functions, the coefficient of each
independent variable is interpreted as the "effect" of that independent variable on
the dependent variable. Thus, in (7), the coefficient of r is the "effect" of
studying on test performance. This usage is typical of the casual causal talk
(lamented by Rogosa, 1987) that often accompanies regression analyses. In section 4, I will
show how causal effects can be precisely defined within Rubin’s model and how 
specific assumptions are needed to conclude that regression coefficients are 
equal to causal effects. 

In my view, a causal ordering is not sufficient to justify interpreting a 
regression coefficient as a causal effect. A causal ordering only tells the 
analyst which regression functions to estimate. 

3.3 Structural Equations Models

In some areas of applied statistics (notably econometrics but also parts
of sociology, psychometrics and even political science) it has become the
standard practice to use a framework that is, in a sense, more general than the
conditional expectations and regression functions of path analysis, just
described. These are called both structural equations models and simultaneous
equations models. Instead of formulating a causal ordering or causal model for
the encouragement design in terms of the regression functions, E(R | S = s) and
E(Y | S = s, R = r), a structural equations model for such a design would be expressed
as the following system of two equations,

    R(u) = d + a S(u) + ε_1(u),                                        (9)

and

    Y(u) = d' + b S(u) + c R(u) + ε_2(u).                              (10)

In (9) and (10), S, R, and Y are as before but ε_1(u) and ε_2(u) are new variables
defined for all u in U, so that equations (9) and (10) hold exactly for each
u. The ε_1 and ε_2 are called "error" or "disturbance" terms and they take up the
slack in the empirical relationship between R and S and between Y and R and S.
The system (9) and (10) is "recursive", in the language of structural equations,
because Y(u) does not occur on the right-hand side of equation (9) (Goldberger,
1964). The disturbance terms differ from the variables Y, R and S in that they
are unobservable. The three-variable system (S(u), R(u), Y(u)) is thus
enlarged to a five-variable system (S(u), R(u), Y(u), ε_1(u), ε_2(u)), which
defines a five-dimensional, multivariate distribution as u varies over U. This is
what is meant by saying that S, R, Y, ε_1, and ε_2 are "random variables". The
causal interpretation of structural equations models, such as (9) and (10), is
based on the following extension of the notion of "effect" in regression
discussed earlier. For example, in equation (9) a is the "effect" of S on R
while ε_1 is that part of R that is determined by all other relevant causes that
are not measured (see Goldberger (1964) for an explicit statement along these
lines). Thus, the equation R = d + aS + ε_1 is a tidy totaling of the effects
of all causes, both measured (i.e. S) and unmeasured (i.e. ε_1), on R. The point
of view that underlies such an interpretation of equation (9) is that the value
of R(u) is "caused" in some sense by numerous factors including S(u). I find
this sense of causation quite unclear because it makes rather vague references
to the "causes" of the value of a variable, i.e. of R(u), rather than to
measuring the effect of the experimental manipulation described by S(u). In
section 4.4, I show how Rubin's model can be used to give a causal interpretation
of the parameters of models like (9) and (10) in some situations.

It is easy to show that without further assumptions on the joint
distribution of the disturbance terms, ε_1 and ε_2, with R and S, equations (9) and
(10) cannot necessarily be interpreted as conditional expectations. This is
often discussed in econometrics as the condition under which ordinary least
squares estimates give unbiased estimates of structural parameters, e.g.,
Goldberger (1964). For example, if we assume equation (9) and compute E(R | S = s),
we get

    E(R | S = s) = E(d + aS + ε_1 | S = s)
                 = d + a s + E(ε_1 | S = s).

Thus, in order for E(R | S = s) = a s + d, we need the joint distribution of ε_1 and S
over U to satisfy

    E(ε_1 | S = s) = 0, for s = 0, 1.                                  (11)

A sufficient condition for this is the independence of ε_1 and S and the usual
zero-expected-value condition, E(ε_1) = 0, for ε_1. Similarly, in order for

    E(Y | S = s, R = r) = d' + b s + c r                               (12)

we need the following condition satisfied:

    E(ε_2 | S = s, R = r) = 0 for all s and r.
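
The role of condition (11) can be illustrated with two tiny artificial populations. This is a sketch under stated assumptions only: the values a = 2 and d = 3 and the listed disturbance values are hypothetical, and regression_slope is an illustrative helper that computes E(R | S = 1) - E(R | S = 0), which is the slope of the regression of R on a binary S.

```python
from fractions import Fraction

a, d = 2, 3  # hypothetical structural parameters in R = d + a*S + eps1


def mean(values):
    """Exact average of a collection of integers."""
    values = list(values)
    return Fraction(sum(values), len(values))


def regression_slope(population):
    """E(R | S = 1) - E(R | S = 0), where R(u) = d + a*S(u) + eps1(u).

    population is a list of (S, eps1) pairs, one pair per unit.
    """
    r1 = mean(d + a * s + e for s, e in population if s == 1)
    r0 = mean(d + a * s + e for s, e in population if s == 0)
    return r1 - r0


# Case 1: condition (11) holds; eps1 averages to zero within each level of S.
pop_ok = [(0, -1), (0, 1), (1, -1), (1, 1)]

# Case 2: eps1 has mean zero over the whole population, but its mean is -1
# when S = 0 and +1 when S = 1, so condition (11) fails.
pop_bad = [(0, -2), (0, 0), (1, 0), (1, 2)]

slope_ok = regression_slope(pop_ok)
slope_bad = regression_slope(pop_bad)
```

In the first population the regression slope equals the structural coefficient a = 2. In the second it equals 4, even though E(ε_1) = 0 overall, because the regression slope absorbs the difference E(ε_1 | S = 1) - E(ε_1 | S = 0).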

Structural equations models like (9) and (10) may be regarded as more 
general than the regression functions (6) and (7) precisely because we may 
impose assumptions on the distribution of the disturbance terms, ε_1 and ε_2, that
do not necessarily result in a correspondence between the equations (9) and (10)
and the regression functions (6) and (7). Unfortunately, since ε_1 and ε_2 are
unobservable, it is not always evident how to verify assumptions made about
them. For example, why should ε_1 be independent of S over U when by definition
ε_1 = R - aS - d, i.e., when the very definition of ε_1 involves S? Such
assumptions must be justified by considerations that go beyond the empirical data.

In my opinion, structural equations models do little more to justify the 
causal interpretation of their coefficients than do the causal orderings of path
analysis. In both approaches, such causal interpretations are established by
fiat rather than by deduction from more basic assumptions. Rubin's model, as I
will show in the next section, allows one to formally state assumptions about
unit-level causal effects that imply causal interpretations of regression
coefficients and structural parameters, if these assumptions are met.

4. A CAUSAL MODEL FOR ENCOURAGEMENT DESIGNS

The appendix gives an overview of Rubin's model as applied to randomized
experiments and observational studies. In this section, I will extend that 
model to accommodate the added complexity of encouragement designs, with two 
levels of encouragement, t and c. I have tried to write this extension of 
Rubin’s model so that the reader does not need to refer to the appendix in order 
to understand it, except for amplification of a few points. 

4.1 The General Model

The key property of encouragement designs is that there is one cause,
encouragement (indicated, as before, by S(u) = 1 or 0 as u is exposed
to t or c), that affects another cause, amount of study
(indicated by R), and that these two causes, in turn, can affect the response
of interest, test performance (indicated by Y). However, the
mathematical structure of R and Y is really quite different from that used in
section 3, where R and Y were both simply regarded as functions of u alone,
R(u) and Y(u).

To begin, the amount that u studies depends, potentially, on u and on the
encouragement condition to which u is exposed, so that R is really a function of
u and s, where s = t or c, i.e. R(u, s). Thus we have

    R(u, t) = amount u studies if encouraged to study,
                                                                       (13)
    R(u, c) = amount u studies if not encouraged to study.

Let K = {t, c}; then K is the set of encouragement conditions and R is a
real-valued function on U x K.

What about Y? The test performance of u depends, potentially, on u, on
whether u is encouraged to study or not (s), and on the amount of time u
studies (r). Hence, Y is a function of u, s, and r, i.e. Y(u, s, r). Thus, we have

    Y(u, t, r) = test score for u if u is encouraged to study
                 and u studies for r hours,
                                                                       (14)
    Y(u, c, r) = test score for u if u is not encouraged to study
                 and u studies for r hours.

The variable S(u) depends only on u, as it did in section 3, since S(u)
indicates whether u is exposed to t or to c. I will engage in a slight abuse of
notation and use S(u) = t or c to index the encouragement condition to which u
is exposed and S(u) = 1 or 0 to indicate the same thing when I need S(u) to be a
treatment indicator or "dummy" variable in a regression function, as in section 3.

In summary, the model for an encouragement design is a quintuple
(U, K, S, R, Y), where U and K are sets, S maps U to K, R is a real-valued function
of (u, s), and Y is a real-valued function of (u, s, r).

A subscript notation is useful, and we let

    R_s(u) = R(u, s),                                                  (15)

and

    Y_{sr}(u) = Y(u, s, r).                                            (16)

Some people find such an explicit notation, i.e. R(u, s) and Y(u, s, r),
loathsome, but I do not see how one can precisely define the elusive concepts
that underlie causal inference without them. R(u, s) and Y(u, s, r) are not
directly observable for all combinations of u, s, and r. This is the main
reason why causal inference is difficult and involves something more than merely
the study of associations. In section 3, I used R(u) and Y(u) to denote the
values of R and Y that are observed for unit u. This standard notation is
actually misleading because it does not reveal the causal structure of the
problem. In terms of S(u), R(u, s) and Y(u, s, r), the observed values of R and
Y are properly defined as follows:

    R_S(u) = R(u, S(u)) = the observed R-response,                     (17)

and

    Y_{S R_S}(u) = Y(u, S(u), R(u, S(u))) = the observed Y-response.   (18)

The use of "multiple versions" of the dependent variable, e.g. R_t and R_c,
Y_{tr} and Y_{cr}, goes back to Neyman (1935) in the experimental design literature
and is often implicit in the early work of Fisher (1926). See Holland (1986b,
section 6) for more on the history of this notation.

The dependence of R(u,s) and Y(u,s,r) on the unit, u, is the way that 
Rubin's model accommodates individual variation in response to causes. This
individual variation is just another way of conceptualizing the idea that the
value of a response, say Y, depends both on causes that are measured, like s and
r, and on other factors that affect u's responses in various ways.

The data obtained from any unit u in an encouragement design is the triple

    (S(u), R_S(u), Y_{S R_S}(u)).                                      (19)
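
The distinction between the functions R(u, s) and Y(u, s, r) and the observed triple (19) can be sketched in a few lines. All values below are hypothetical: the two students, their study hours, and in particular the functional form of Y are invented for illustration and are not part of the model itself.

```python
T, C = "t", "c"  # the two encouragement conditions in K

# R(u, s): hours u would study under encouragement condition s (hypothetical).
R = {("u1", T): 4, ("u1", C): 1,
     ("u2", T): 2, ("u2", C): 2}


def Y(u, s, r):
    """Y(u, s, r): test score of u under condition s after r hours of study.

    The functional form is invented purely for illustration.
    """
    baseline = {"u1": 50, "u2": 60}[u]
    return baseline + 5 * r + (2 if s == T else 0)


# S(u): the condition to which each unit was actually exposed.
S = {"u1": T, "u2": C}


def observed(u):
    """The data triple (19): (S(u), R_S(u), Y_{S R_S}(u))."""
    s = S[u]
    r_obs = R[(u, s)]        # equation (17)
    y_obs = Y(u, s, r_obs)   # equation (18)
    return s, r_obs, y_obs


data = {u: observed(u) for u in S}
```

Here data contains only one of the two values R(u, t), R(u, c) for each student, and only one point on each student's score function, which is the sense in which the full functions are not directly observable.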

In an encouragement design, the values of S(u) are under experimental 
control, so that the value of S(u) for each u can be determined by randomization. 
When U is infinite, randomization implies that S(u) is statistically independent
of R_s(u) and Y_{sr}(u) over U for any choices of s and r. When U is finite and
large, randomization implies that the independence of S and R_s and of S and Y_{sr}
over U holds approximately. This is discussed in more detail in the appendix.

An important difference between the variables R_s and R_S and between Y_{sr} and
Y_{S R_S} is that, except in very special circumstances, randomization does not imply
that the observed variables R_S or Y_{S R_S} are statistically independent of S over U
even though R_s and Y_{sr} are. For example,

    P(R_S = r | S = t) = P(R_t = r | S = t) = P(R_t = r),

and unless P(R_t = r) = P(R_c = r), it follows that P(R_S = r | S = t) ≠ P(R_S = r).
(The "probabilities" [P(R_S = r | S = t), P(R_t = r), etc.] are to be interpreted
simply as proportions of units in U; see the appendix.) Thus, randomization may
be used to justify the assumption that S and {R_s, Y_{sr}} are independent
but not that S and the observed values, R_S and Y_{S R_S}, are independent.
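
This point can be checked on a small artificial population in which randomization is idealized: every type of unit appears equally often under t and under c, so S is exactly independent of the pair (R_t, R_c). The unit types and the helper prop are hypothetical and serve only as an illustration; "probabilities" are computed as proportions of units, as in the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical unit types: (R_t, R_c) = hours studied under t and under c.
types = [(4, 1), (3, 1), (2, 2)]

# Idealized randomization: each type occurs once under t and once under c.
population = [(s, r_t, r_c) for (r_t, r_c), s in product(types, ("t", "c"))]


def prop(event, given=lambda unit: True):
    """Proportion of units satisfying event among those satisfying given."""
    pool = [unit for unit in population if given(unit)]
    return Fraction(sum(1 for unit in pool if event(unit)), len(pool))


def exposed_to_t(unit):
    return unit[0] == "t"


def r_t_is_4(unit):
    """Event R_t = 4, defined whether or not the unit was exposed to t."""
    return unit[1] == 4


def observed_r_is_4(unit):
    """Event R_S = 4, where R_S is the study time actually observed."""
    s, r_t, r_c = unit
    return (r_t if s == "t" else r_c) == 4


p_rt_given_t = prop(r_t_is_4, exposed_to_t)         # P(R_t = 4 | S = t)
p_rt = prop(r_t_is_4)                               # P(R_t = 4)
p_obs_given_t = prop(observed_r_is_4, exposed_to_t)  # P(R_S = 4 | S = t)
p_obs = prop(observed_r_is_4)                       # P(R_S = 4)
```

In this population P(R_t = 4 | S = t) and P(R_t = 4) are both 1/3, while P(R_S = 4 | S = t) is 1/3 and P(R_S = 4) is 1/6, so the observed study time is not independent of S even though R_t is.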

There are four types of unit-level causal effects in this system: three 
different effects of encouragement (t) and one effect of studying (R). Thus, t 
can affect both R and Y, and two of the t-effects are defined as follows: 

$R_t(u) - R_c(u)$ = the causal effect of t on R, (20)

$Y_{tR_t(u)}(u) - Y_{cR_c(u)}(u)$ = the causal effect of t on Y. (21)

The definition in (20) is interpreted as the increment in the amount that unit u
would study if encouraged to study over how much u would study if not
encouraged. The definition in (21) is similar in that it is the increment in
the test score that u would obtain if u were encouraged to study (and studied
for $R_t(u)$ hours) over the test score that u would obtain if u were not
encouraged to study (and studied for $R_c(u)$ hours).

In addition to (20) and (21), in order to specify the ALICE model in the
next section, we need to define the effect of t on Y for fixed r, i.e., the
effect of t on $Y(\cdot, \cdot, r)$. This is

$Y_{tr}(u) - Y_{cr}(u)$ = the causal effect of t on $Y(\cdot, \cdot, r)$. (22)

Definition (22) is the "pure" effect of encouragement on test scores because it
is the increment in u's test score when u studies r hours and is encouraged to
study, compared to u's test score when u studies r hours but is not encouraged
to study. Definition (22) is an explicit statement of the idea that the amount
u studies is a self-selected treatment that can differ from what actually
occurs, i.e., from the particular values $R_t(u)$ and $R_c(u)$ that appear in
definition (21). The idea behind (22) is, in my opinion, quite subtle and central to
the notion of indirect causation. In the studying example used throughout this
paper, it may be plausible to suppose that causal effects defined in (22) are
all zero, but I shall not make that assumption at this stage of the development
in order to allow the model to apply to other cases in which these causal
effects might not be zero.

The amount of study, R, can affect only Y, and the effect of R is defined as
follows:

$Y_{sr}(u) - Y_{sr'}(u)$ = the causal effect of R = r relative to R = r' on $Y(\cdot, s, \cdot)$. (23)

Definition (23) is also an explicit statement of the idea that amount of study is
a self-selected treatment and can differ from the amount the student did study;
i.e., r could have taken on values other than $R_t(u)$ and $R_c(u)$. In (23), the
encouragement condition is fixed, s, and the causal effect of R is the change in
test score that results when u studies r versus r' hours.

These four types of causal effects, i.e.,

$R_t(u) - R_c(u)$, $Y_{tR_t(u)}(u) - Y_{cR_c(u)}(u)$, $Y_{tr}(u) - Y_{cr}(u)$ and $Y_{sr}(u) - Y_{sr'}(u)$,

are all defined on each unit and express the effect of encouragement and of
studying on the behavior of individual students.
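The four unit-level effects can be written out for a single hypothetical student. In the sketch below the response functions `R` and `Y`, and every number in them, are assumptions chosen only for illustration; `Y` is deliberately nonlinear in r so that nothing here presupposes a particular causal theory.

```python
# One hypothetical student u, described by two response functions.
# R(s): hours studied under encouragement condition s ("t" or "c").
# Y(s, r): test score under condition s after studying r hours.
# Functional forms and numbers are illustrative assumptions.

def R(s):
    return 5.0 if s == "t" else 2.0

def Y(s, r):
    boost = 1.5 if s == "t" else 0.0             # effect of encouragement at fixed r
    return 60.0 + boost + 4.0 * r - 0.2 * r * r  # nonlinear in r on purpose

effect_t_on_R = R("t") - R("c")                  # definition (20)
effect_t_on_Y = Y("t", R("t")) - Y("c", R("c"))  # definition (21)

def effect_t_on_Y_fixed_r(r):                    # definition (22)
    return Y("t", r) - Y("c", r)

def effect_R_on_Y(s, r, r_prime):                # definition (23)
    return Y(s, r) - Y(s, r_prime)

# (21) always splits into a (22)-type term at r = R_t(u) plus a (23)-type
# term under condition c; this is an algebraic identity, not an assumption.
decomposed = effect_t_on_Y_fixed_r(R("t")) + effect_R_on_Y("c", R("t"), R("c"))
print(effect_t_on_R, effect_t_on_Y, decomposed)
```

Only one of the values Y("t", R("t")) and Y("c", R("c")) could ever be observed for a real student, which is why such effects are computable here only because the response functions were assumed.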

The key feature of Rubin's model is its use of unit-level causal effects as 
the basic building blocks for defining all other causal parameters. (Rogosa
(1987) also emphasizes the importance of models that start at the level of
individual units and build up.) Unit-level causal effects are never directly
observable because of what I call the Fundamental Problem of Causal Inference (see
the appendix), but they may be used to define causal parameters that can be
estimated or measured with data.

Averaging each of the four types of unit-level causal effects over U
results in the important causal parameters called average causal effects, or
ACEs. The four ACEs are

$\mathrm{ACE}_{tc}(R) = E(R_t - R_c)$, (24)

$\mathrm{ACE}_{tc}(Y) = E(Y_{tR_t} - Y_{cR_c})$, (25)

$\mathrm{ACE}_{tc}(Y(\cdot, \cdot, r)) = E(Y_{tr} - Y_{cr})$, (26)

and

$\mathrm{ACE}_{rr'}(Y(\cdot, s, \cdot)) = E(Y_{sr} - Y_{sr'})$. (27)

In (24) - (27) and below, we use E( ) to denote expectation or average over U.

The ACEs are typically the only causal parameters that can be estimated with 
data. Under some conditions, such as those defined by the ALICE model discussed 
in section 4.2, an ACE may be interpreted as a unit-level causal effect, but in 
general it is not. 

The ACEs must be distinguished from the prima facie average causal effects,
or FACEs, which are defined in terms of the observables S(u), $R_S(u)$, and $Y_{SR_S}(u)$.

The four FACEs are the following differences in regression functions,

$\mathrm{FACE}_{tc}(R) = E(R_S \mid S = t) - E(R_S \mid S = c)$, (28)

$\mathrm{FACE}_{tc}(Y) = E(Y_{SR_S} \mid S = t) - E(Y_{SR_S} \mid S = c)$, (29)

$\mathrm{FACE}_{tc}(Y(\cdot, \cdot, r)) = E(Y_{SR_S} \mid S = t, R_S = r) - E(Y_{SR_S} \mid S = c, R_S = r)$, (30)

$\mathrm{FACE}_{rr'}(Y(\cdot, s, \cdot)) = E(Y_{SR_S} \mid S = s, R_S = r) - E(Y_{SR_S} \mid S = s, R_S = r')$. (31)

Because they are based on the observables, the FACEs are associational
parameters rather than causal parameters. They are prima facie ACEs rather than
ACEs because they may or may not equal their corresponding ACEs, depending on
whether certain assumptions are met. Causal inference in Rubin's model means
inference about causal parameters, such as the ACEs. Such inferences must be
made from observable data and hence the FACEs play an important role. For
example, consider $\mathrm{FACE}_{tc}(R)$. Since S is independent of $R_t$ and $R_c$ by assumption
(a consequence of the random assignment of the encouragement conditions), we
have

$\mathrm{FACE}_{tc}(R) = E(R_S \mid S = t) - E(R_S \mid S = c)$

$= E(R_t \mid S = t) - E(R_c \mid S = c)$

$= E(R_t) - E(R_c)$

$= \mathrm{ACE}_{tc}(R)$. (32)

Thus, because of random assignment, the causal parameter, $\mathrm{ACE}_{tc}(R)$, and the
associational parameter, $\mathrm{FACE}_{tc}(R)$, are equal. Similarly, one may show that

$\mathrm{FACE}_{tc}(Y) = \mathrm{ACE}_{tc}(Y)$, (33)

also because of the random assignment of encouragement.

The other two FACEs involve $R_S$, whose distribution is not under experimental
control. First consider

$\mathrm{FACE}_{tc}(Y(\cdot, \cdot, r)) = E(Y_{SR_S} \mid S = t, R_S = r) - E(Y_{SR_S} \mid S = c, R_S = r)$

$= E(Y_{tr} \mid S = t, R_t = r) - E(Y_{cr} \mid S = c, R_c = r)$

$= E(Y_{tr} \mid R_t = r) - E(Y_{cr} \mid R_c = r)$. (34)

In general, this does not equal $E(Y_{tr}) - E(Y_{cr})$. Therefore, we cannot use
$\mathrm{FACE}_{tc}(Y(\cdot, \cdot, r))$ for $\mathrm{ACE}_{tc}(Y(\cdot, \cdot, r))$ without additional assumptions.

Next consider

$\mathrm{FACE}_{rr'}(Y(\cdot, s, \cdot)) = E(Y_{SR_S} \mid S = s, R_S = r) - E(Y_{SR_S} \mid S = s, R_S = r')$

$= E(Y_{sr} \mid S = s, R_s = r) - E(Y_{sr'} \mid S = s, R_s = r')$

$= E(Y_{sr} \mid R_s = r) - E(Y_{sr'} \mid R_s = r')$. (35)

Again, in general this does not equal the corresponding ACE, i.e., $E(Y_{sr}) - E(Y_{sr'})$,
which is the average causal effect of studying on test performance that
interests us in an encouragement design.
What can we conclude so far? First, assuming random assignment of the
encouragement conditions to units, the FACEs based on the conditional
expectations of $R_S$ and of $Y_{SR_S}$ given S are equal to their corresponding ACEs and thus
have causal interpretations as ACEs. These FACEs would be estimated, in
practice, by treatment-control mean differences for $R_S$ and $Y_{SR_S}$, respectively. This
result is not surprising, and related material is discussed in the appendix.
Second, the other two FACEs, those based on the conditional expectation of
$Y_{SR_S}$ given both S and $R_S$, do not equal their corresponding ACEs, in general, and
in particular, without further assumptions it is not true that the "effect of
studying" on test performance that one would obtain from a regression analysis
of $Y_{SR_S}$ on S and $R_S$ can be interpreted as an average causal effect over U.
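Both conclusions can be checked on a toy population. The sketch below assumes two kinds of student whose baseline score rises with the amount they study unencouraged; the response functions `R` and `Y` and all numbers are hypothetical. Assignment is balanced, so S is exactly independent of the potential values over this finite U.

```python
from fractions import Fraction

# Hypothetical kinds of student: (R_c, R_t, baseline score). Assumed values.
kinds = [(1, 2, 50), (2, 3, 60)]
units = [{"S": s, "R_c": rc, "R_t": rt, "base": b}
         for rc, rt, b in kinds for s in ("t", "c")]

def R(u, s):
    """Potential study hours R(u, s)."""
    return u["R_t"] if s == "t" else u["R_c"]

def Y(u, s, r):
    """Potential test score Y(u, s, r); an assumed form for illustration."""
    return u["base"] + (1 if s == "t" else 0) + 2 * r

def R_obs(u):
    return R(u, u["S"])

def Y_obs(u):
    return Y(u, u["S"], R_obs(u))

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

def cond_mean(f, pred):
    """Average of f over the units of U that satisfy pred."""
    return mean(f(u) for u in units if pred(u))

# ACEs: averages of unit-level effects over all of U, as in (24), (25), (27).
ace_R = mean(R(u, "t") - R(u, "c") for u in units)
ace_Y = mean(Y(u, "t", R(u, "t")) - Y(u, "c", R(u, "c")) for u in units)
ace_study = mean(Y(u, "c", 2) - Y(u, "c", 1) for u in units)  # s = c, r = 2, r' = 1

# FACEs: differences of observed conditional means, as in (28), (29), (31).
face_R = cond_mean(R_obs, lambda u: u["S"] == "t") - cond_mean(R_obs, lambda u: u["S"] == "c")
face_Y = cond_mean(Y_obs, lambda u: u["S"] == "t") - cond_mean(Y_obs, lambda u: u["S"] == "c")
face_study = (cond_mean(Y_obs, lambda u: u["S"] == "c" and R_obs(u) == 2)
              - cond_mean(Y_obs, lambda u: u["S"] == "c" and R_obs(u) == 1))

print(ace_R, face_R)          # equal
print(ace_Y, face_Y)          # equal
print(ace_study, face_study)  # not equal: self-selected study time
```

Under these assumptions the two FACEs that condition only on S reproduce their ACEs, while the FACE for studying is inflated because students who study more unencouraged also have higher baselines.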

4.2 The ALICE model 

In Rubin's model, a causal theory specifies, or partially specifies, values
for R(u,s) and Y(u,s,r). An important causal theory that I find helpful in
understanding the relationship between this extension of Rubin's model and path
analysis and structural equations models is what I call the "additive, linear,
constant-effect" or ALICE model. It is given by three equations that involve
unit-level causal effects:

$R_t(u) - R_c(u) = \rho$, (36)

$Y_{tr}(u) - Y_{cr}(u) = \tau$, (37)

$Y_{sr}(u) - Y_{sr'}(u) = \beta (r - r')$. (38)

In this model, the effects of t and r on Y for a given unit, u, are additive, the
effect of r on Y enters linearly, and the causal effects of t on R and Y and of
r on Y are constant, not depending on the unit, i.e., this is a causal theory
with constant effects (see the appendix for more on constant effects).

Equations (36) - (38) involve 3 of the 4 unit-level causal effects in (20) -
(23). The fourth one, i.e., (21), can be expressed in terms of the other three:

$Y_{tR_t(u)}(u) - Y_{cR_c(u)}(u) = \tau + \rho\beta$. (38a)

In (36), $\rho$ is the (constant) number of hours that encouragement increases
each student's amount of study. In (38a), $\tau + \rho\beta$ is the (constant) improvement in
test scores due to encouragement to study. In (37), $\tau$ is the (constant) amount
that encouragement increases the test scores of a student who always studies r.
In (38), $\beta$ is the (constant) amount that studying one hour more increases a
student's test scores.

The ALICE model in (36) - (38) is equivalent to these two functional
relations of the variables s and r for each fixed unit, u:

$R_s(u) = R_c(u) + \rho s$, (39)

$Y_{sr}(u) = Y_{c0}(u) + \tau s + \beta r$. (40)



On the right-hand sides of (39) and (40), s is a 0/1 variable. $Y_{c0}(u)$ is the
test performance of u if u is not encouraged to study and doesn't study, and
$R_c(u)$ is the amount u studies if not encouraged to study. The values of $Y_{c0}(u)$
and $R_c(u)$ will vary from student to student and are the vehicle for introducing
unit heterogeneity into this model -- see the appendix on unit homogeneity.
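The functional relations (39) and (40) translate directly into code. The sketch below builds three hypothetical students who differ only in $R_c(u)$ and $Y_{c0}(u)$; the parameter values `RHO`, `TAU`, `BETA` and the helper `make_unit` are assumptions for illustration. It then confirms that every student has the same unit-level effects (36) - (38a).

```python
RHO, TAU, BETA = 2, 1, 3   # hypothetical values of rho, tau, beta

def make_unit(R_c, Y_c0):
    """Return the response functions of one student under the ALICE model."""
    def R(s):              # (39), with s coded 1 for t and 0 for c
        return R_c + RHO * s
    def Y(s, r):           # (40)
        return Y_c0 + TAU * s + BETA * r
    return R, Y

# Students differ only in (R_c, Y_c0): this is the unit heterogeneity.
students = [make_unit(1, 50), make_unit(4, 42), make_unit(0, 71)]

effects_36 = [R(1) - R(0) for R, Y in students]               # effect of t on R
effects_37 = [Y(1, 2) - Y(0, 2) for R, Y in students]         # effect of t at fixed r
effects_38 = [Y(0, 5) - Y(0, 3) for R, Y in students]         # effect of r = 5 vs r' = 3
effects_38a = [Y(1, R(1)) - Y(0, R(0)) for R, Y in students]  # total effect of t on Y

print(effects_36, effects_37, effects_38, effects_38a)
```

Each list is constant across students even though their baselines differ, which is what "constant effect" means here.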

For a fixed unit, (39) and (40) are a deterministic linear system involving
the functions $Y_{sr}(u)$ and $R_s(u)$ of the variables s and r. If we equate r and
$R_s(u)$, then we have a deterministic linear system, and as in section 3, we may
associate the path diagram of Figure 4 with it.
I have left off the subscripts for R and Y in Figure 4 to emphasize that it does
not describe empirical relationships in data; i.e., it is not an empirical path
diagram. Rather, it is a theory about the values of Y(u,s,r) and R(u,s);
i.e., it is a causal model or a causal theory.

The ALICE model may appear to be an extremely strong model, yet we shall
see presently that it is not strong enough to ensure that the regression
coefficients of path analysis have the desired causal interpretations.

The parameters of the ALICE model ($\rho$, $\beta$, and $\tau$) may be used to express the
four ACEs of the model. These are

$\mathrm{ACE}_{tc}(R) = \rho$, (41)

$\mathrm{ACE}_{tc}(Y) = \tau + \beta\rho$, (42)

$\mathrm{ACE}_{tc}(Y(\cdot, \cdot, r)) = \tau$, (43)

$\mathrm{ACE}_{rr'}(Y(\cdot, s, \cdot)) = \beta (r - r')$. (44)



The ALICE causal model is an example of a "constant effect" model (see the
appendix). Consequently, in the ALICE model, the ACEs are interpretable as
unit-level causal effects. This is seen by comparing (41) - (44) with (36) -
(38a).

We see that the "total effect" of S on Y in Figure 4 (i.e., $\tau + \beta\rho$) is an
ACE. In addition, for the ALICE model, $\rho$, $\tau$, and $\beta$ can be interpreted as ACEs.
What about the FACEs? From the results of the previous subsection we know that
because of randomization,

$\mathrm{FACE}_{tc}(R) = \mathrm{ACE}_{tc}(R) = \rho$, (45)

and

$\mathrm{FACE}_{tc}(Y) = \mathrm{ACE}_{tc}(Y) = \tau + \beta\rho$. (46)

The other two FACEs are more complicated. They may be shown to be given by the
following formulas:

$\mathrm{FACE}_{tc}(Y(\cdot, \cdot, r)) = \tau + \mu_c(r - \rho) - \mu_c(r)$, (47)

and

$\mathrm{FACE}_{rr'}(Y(\cdot, s, \cdot)) = \beta (r - r') + \mu_s(r) - \mu_s(r')$, (48)

where

$\mu_s(r) = E(Y_{c0} \mid R_s = r)$ for s = t, c. (49)

Thus, the two remaining FACEs both equal their corresponding ACEs plus
biases that involve the regression of $Y_{c0}$ on $R_s$, i.e., $\mu_s(r)$.
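One way to see where (47) and (48) come from is to substitute (40) into the last lines of (34) and (35). The following is a sketch of those steps, not a reproduction of a derivation given elsewhere; it uses only (34), (35), (39), (40), and (49), with s coded 0/1 in the term $\tau s$.

```latex
% By (39), R_t(u) = R_c(u) + \rho, so the event {R_t = r} is the event
% {R_c = r - \rho}, and hence \mu_t(r) = \mu_c(r - \rho).
\begin{align*}
E(Y_{tr} \mid R_t = r) &= \tau + \beta r + E(Y_{c0} \mid R_t = r)
                        = \tau + \beta r + \mu_c(r - \rho), \\
E(Y_{cr} \mid R_c = r) &= \beta r + \mu_c(r), \\
\mathrm{FACE}_{tc}(Y(\cdot,\cdot,r)) &= \tau + \mu_c(r - \rho) - \mu_c(r), \\[4pt]
E(Y_{sr} \mid R_s = r) &= \tau s + \beta r + \mu_s(r), \\
E(Y_{sr'} \mid R_s = r') &= \tau s + \beta r' + \mu_s(r'), \\
\mathrm{FACE}_{rr'}(Y(\cdot,s,\cdot)) &= \beta (r - r') + \mu_s(r) - \mu_s(r').
\end{align*}
```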

The value of $\mu_c(r)$ is the average value of test scores for students when
they are not encouraged to study and they do not study, for all those students
who would study an amount r when they are not encouraged to study. Thus, $\mu_c(r)$
is a "counterfactual" regression because $Y_{c0}$ and $R_c$ can never be simultaneously
observed except when $R_c = 0$. Hence, $\mu_c(r)$ is inherently unobservable, and
assumptions made about it have no empirical consequences that can be directly tested.
The function $\mu_c(r)$ is a complicated quantity and one that is not easily thought
about. Suppose, for simplicity, that it is linear, i.e., that

$\mu_c(r) = \gamma + \delta r$. (50)
A positive $\delta$ means that the more a student would study when not encouraged, the
higher he or she would score on the test without studying and without
encouragement. A negative $\delta$ means that the more a student would study when not
encouraged, the lower he or she would score without studying and without
encouragement.

The quantities computed in path analysis are the conditional expectations

$E(R_S \mid S) = E(R_c) + \rho S$ (51)

and

$E(Y_{SR_S} \mid S, R_S) = \mu_c(R_S - \rho S) + \tau S + \beta R_S$, (52)

in which S is a 1/0 indicator variable. If we make the untestable assumption
that $\mu_c(r)$ is linear, e.g., (50), then (52) becomes

$E(Y_{SR_S} \mid S, R_S) = \gamma + (\tau - \delta\rho) S + (\beta + \delta) R_S$. (53)

Equations (51) and (53) are both linear and may be combined into the empirical
path diagram in Figure 5.



Figure 5 (empirical path diagram for equations (51) and (53))

Comparing Figures 5 and 4, we see that even if the ALICE model holds and
$\mu_c(r)$ is linear, the estimated path coefficients are biased estimates of the
causal effects $\tau$ and $\beta$ unless $\mu_c(r)$ does not depend on r (i.e., $\delta = 0$).
Furthermore, these problems stem from the inhomogeneity of the units with
respect to the values of $Y_{c0}(u)$ and $R_c(u)$. This inhomogeneity is, I believe,
the proper way to view the "disturbance terms" of the structural equations
model (9) and (10) in section 3 (see section 4.4). One nice thing is that
while the direct effects are not the same in Figures 5 and 4, the total effects
are: both equal $\tau + \rho\beta$.
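The bias in the path coefficients, and the agreement of the total effects, can be reproduced exactly on a constructed population. The sketch below is an illustration under assumed values: `RHO`, `TAU`, `BETA`, `GAMMA`, `DELTA` are hypothetical, $\mu_c(r)$ is made exactly linear as in (50), and assignment is balanced. The helpers `solve` and `ols` are small exact-arithmetic routines written for this example, not a library API.

```python
from fractions import Fraction
from itertools import product

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact Fractions."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * p for a, p in zip(M[i], M[col])]
    return [row[n] for row in M]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

RHO, TAU, BETA = 2, 1, 3   # hypothetical ALICE parameters
GAMMA, DELTA = 50, -2      # hypothetical mu_c(r) = GAMMA + DELTA * r

rows = []
for R_c, e, S in product(range(4), (-1, 1), (0, 1)):
    Y_c0 = GAMMA + DELTA * R_c + e       # e averages to zero given R_c
    R_S = R_c + RHO * S                  # observed study time, from (39)
    Y_obs = Y_c0 + TAU * S + BETA * R_S  # observed score, from (40)
    rows.append((S, R_S, Y_obs))

_, path_S_to_R = ols([[1, S] for S, _, _ in rows], [R for _, R, _ in rows])
_, path_S_to_Y, path_R_to_Y = ols([[1, S, R] for S, R, _ in rows],
                                  [Y for _, _, Y in rows])
total_effect = path_S_to_Y + path_S_to_R * path_R_to_Y

print(path_S_to_R, path_S_to_Y, path_R_to_Y, total_effect)
```

With these assumed values the fitted direct paths are $\tau - \delta\rho$ and $\beta + \delta$ rather than $\tau$ and $\beta$, while the total effect still equals $\tau + \rho\beta$.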

4.3 Two Different Ways to Estimate the Causal Effect of the Encouraged Activity

The message of the previous subsection is that the effect of study on test
performance cannot be estimated by the usual regression methods of path
analysis without making untestable assumptions about the counterfactual regression
function, $\mu_c(r)$. If we assume that $\mu_c(r)$ is constant, then the biases shown in
Figure 5 vanish and the usual path coefficients may be interpreted as causal
effects, i.e., as ACEs. However, because $\mu_c(r)$ is difficult to think about,
there is little reason to believe that it is constant. Nor can $\delta$ be easily
assessed as either positive or negative since, in this example, there are
reasons why it might be either: If students who study a lot tend to be those who
do well even when they don't study, then $\delta$ is positive; but if those who study a
lot are those who need to study, then $\delta$ is negative.

An alternative approach is to suppose that encouragement, of and by itself,
has no effect on Y. In the studying example, this might be a plausible
assumption. This corresponds to the restriction that

$\tau = 0$. (54)

Now the empirical path diagram becomes that in Figure 6,
and the path diagram for the causal theory becomes that in Figure 7.

Figure 7

The total effect of S on $Y_{SR_S}$ is now $\rho\beta$, whereas the total effect of S on
$R_S$ is $\rho$. Hence,

$\beta = \frac{\rho\beta}{\rho} = \frac{\text{total effect of S on } Y}{\text{total effect of S on } R_S}$. (55)

This is also easily seen from the definitions of the ACEs and the FACEs. Under
the assumption that $\tau = 0$,
$\mathrm{ACE}_{tc}(Y) = \mathrm{FACE}_{tc}(Y) = \beta\rho$, (56)

(regardless of whether or not $\mu_c(r)$ is linear) and hence

$\beta = \frac{\mathrm{ACE}_{tc}(Y)}{\mathrm{ACE}_{tc}(R)} = \frac{\mathrm{FACE}_{tc}(Y)}{\mathrm{FACE}_{tc}(R)}$. (57)

The two FACEs in (57) may be estimated simply by the treatment-control mean
differences in $Y_{SR_S}$ and $R_S$, as mentioned earlier, so that (57) provides an
alternative way to estimate $\beta$ that does not assume that $\mu_c(r)$ is constant. In Powers
and Swinton (1984), (57) was used to estimate $\beta$.
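The two approaches can be compared side by side on an assumed population in which $\tau = 0$ holds but $\delta \neq 0$. In the sketch below all parameter values are hypothetical; `beta_ratio` implements the ratio in (57), and `naive_slope` is the simple regression slope of the observed score on observed study time among unencouraged students.

```python
from fractions import Fraction
from itertools import product

RHO, BETA = 2, 3        # hypothetical values; TAU = 0 by assumption (54)
GAMMA, DELTA = 50, -2   # hypothetical mu_c(r) = GAMMA + DELTA * r, so delta != 0

units = []
for R_c, e, S in product(range(4), (-1, 1), (0, 1)):
    Y_c0 = GAMMA + DELTA * R_c + e   # e averages to zero given R_c
    R_S = R_c + RHO * S              # observed study time
    Y_obs = Y_c0 + BETA * R_S        # observed score with tau = 0
    units.append((S, R_S, Y_obs))

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

# Treatment-control mean differences: the two FACEs in (57).
face_R = mean(R for S, R, _ in units if S == 1) - mean(R for S, R, _ in units if S == 0)
face_Y = mean(Y for S, _, Y in units if S == 1) - mean(Y for S, _, Y in units if S == 0)
beta_ratio = face_Y / face_R

# Regression slope of Y on R among the unencouraged students only.
controls = [(R, Y) for S, R, Y in units if S == 0]
R_bar = mean(R for R, _ in controls)
Y_bar = mean(Y for _, Y in controls)
naive_slope = (sum((R - R_bar) * (Y - Y_bar) for R, Y in controls)
               / sum((R - R_bar) ** 2 for R, _ in controls))

print(beta_ratio, naive_slope)
```

Under these assumptions the ratio recovers $\beta$ while the regression slope returns $\beta + \delta$; the ratio is only as good as the assumption $\tau = 0$, which this construction imposes rather than tests.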

4.4 Deriving a Structural Equations Model

The ALICE model may be used to derive the structural equations model given
in (9) and (10). If we substitute S(u) for s in (39) and S(u) for s and $R_S(u)$ for
r in (40) we get the following pair of equations that involve the observables,
S, $R_S$, $Y_{SR_S}$:

$R_S(u) = R_c(u) + \rho S(u)$ (58)

and

$Y_{SR_S}(u) = Y_{c0}(u) + \tau S(u) + \beta R_S(u)$. (59)

Now let

$\eta_1(u) = R_c(u) - E(R_c)$,

and

$\eta_2(u) = Y_{c0}(u) - E(Y_{c0})$,

and then define

$\alpha = E(R_c)$, $\alpha' = E(Y_{c0})$.

The following equations, which parallel the structural equations model of (9)
and (10), follow immediately:

$R_S(u) = \alpha + \rho S(u) + \eta_1(u)$, (60)

$Y_{SR_S}(u) = \alpha' + \tau S(u) + \beta R_S(u) + \eta_2(u)$. (61)
It is easy to see from the definition of $\eta_1$ and $\eta_2$ that by the independence
assumption (justified by randomization), S is independent of $\eta_1$ and $\eta_2$ over U.
But $R_S$ is not independent of $\eta_2$ in general. In fact, the condition that
equation (61) be interpretable as a conditional expectation is exactly that $\mu_c(r)$
be constant. It follows from the standard theory of structural equations models
that ordinary least squares estimates of $\beta$ are biased in general, so that a
simple regression analysis of (61) would not lead to an estimate of the causal
effect of studying on test scores. Substituting (60) for $R_S$ in (61) yields

$Y_{SR_S}(u) = \alpha' + \beta\alpha + (\tau + \rho\beta) S(u) + \beta\eta_1(u) + \eta_2(u)$

$= \alpha'' + (\tau + \rho\beta) S(u) + \eta_3(u)$. (62)

Equations (60) and (62) constitute the so-called "reduced form" of the
system (60) and (61). Since S is independent of $\eta_1$ and $\eta_2$, S is also
independent of $\eta_3 = \beta\eta_1 + \eta_2$ in (62). Thus, (60) and (62) can be interpreted as
regression functions; therefore, in the language of structural equations models,
(60) can be used to estimate $\rho$ and (62) can be used to estimate $\tau + \rho\beta$.
Assuming $\tau = 0$ now leads to the second estimate of $\beta$ discussed in section 4.3.

In closing this section, I wish to point out that the ALICE model leads to
the structural equations system (60) and (61); but the ALICE model could be
wrong, in various, often testable, ways. Freedman (1987) has argued that
models like (60) and (61) should be tested before they are used. Rubin's model
gives us a framework for doing that testing. But an assumption like $\tau = 0$ is
not testable with the data in hand and must be justified on other grounds.
5. DISCUSSION

One purpose of this paper is to show that path analysis and its
generalization, structural equations models, do not justify causal interpretations of
regression coefficients. Instead, these models simply define certain regression
coefficients as causal effects by fiat, using the loose causal terminology of
regression analysis. Rubin's model, on the other hand, precisely defines
unit-level causal effects, and these, in turn, may be used to deduce causal
interpretations of some regression coefficients under some assumptions. By explicitly
separating the causal theory, R(u,s) and Y(u,s,r), from the observed data,
(S(u), $R_S(u)$, and $Y_{SR_S}(u)$), Rubin's model provides analysts with a set of tools
that engender careful thought about causal theories and their relationship to
data.

One example of such "careful thought" is the care that must be exercised in 
identifying variables in complex path models that can truly play the role of 
both effect and cause. Such variables must measure the amount of exposure of 
units to a cause and be themselves influenced by another cause. I used the 
encouragement design to focus my analysis precisely because it is a clearly
interpretable example of indirect causation in this sense. Many causal models 
in the literature are not careful about this point; and at best, they merely 
measure the association between variables rather than produce estimates of 
causal effects. 

I am a little reluctant to place as much emphasis as I have on the ALICE 
model of section 4, because I do not wish to appear to endorse it as the only 
way to analyze data from encouragement designs. (For example, it would be 
inappropriate if "studying" were measured simply as a studied/didn't study 
dichotomy.) I simply put the ALICE model forward as a basic case from which 
deductions are easily made and which has interesting consequences for path
analysis models. It is a model that captures some of the complexity of
encouragement designs and self-selected treatments. Heterogeneity, nonlinearity and
non-additivity can be added to the ALICE model in various ways to add complexity
when that is necessary for proper analyses.

The two alternative ways to estimate the effect of studying on test scores
given in section 4.3 were discussed simply to illustrate how the causal model
must be used to produce estimates of causal effects. In my opinion, in the
studying example, at least, the untestable assumption that $\tau = 0$ is more
believable (and understandable) than the untestable assumption that leads to the
usual path analytic estimate of the causal effect, i.e., $\delta = 0$. Furthermore,
though the structural equations model of (60) and (61) can be used to obtain the
ratio estimate of $\beta$ in section 4.3, there is no way to justify these equations
except through the ALICE model, or its generalizations.

The assumption of random assignment of the encouragement condition is an
important starting place, but there may be applications in which this is
impossible or implausible, and we need to consider the corresponding
observational study in which S is not independent of $\{R_s\}$ and $\{Y_{sr}\}$. An interesting
generalization is to replace randomization by a strong ignorability type of
condition given a covariate, X(u) (see the appendix). For example, suppose that
there is a covariate X such that given X, S is conditionally independent of $\{R_s\}$
and $\{Y_{sr}\}$. Now, all the equations of section 4 will hold conditionally given X,
and the calculation of the FACEs is replaced by the corresponding covariate
adjusted FACEs, i.e., the C-FACEs (see the appendix). How might we wish to
represent such a system in terms of path diagrams? In Holland (1986b), I
suggested that an arrow connect two variables only if one indicated a cause
(like S or $R_S$) and the other measured a response (like $R_S$ or $Y_{SR_S}$). A covariate
does not have the status of a causal indicator or of a response, so it ought not
be involved with the arrows, according to such a view. Hence, the four-variable
system of (X(u), S(u), $R_S(u)$, $Y_{SR_S}(u)$) could be represented as in Figure 8,
in which X precedes S and R, to indicate that it is not affected by either causal
variable. However, it might be useful to indicate the conditional independence
of S and all of the variables $\{R_s\}$ and $\{Y_{sr}\}$ given X. This should be done, not
in an empirical path diagram like Figure 8, but in a path diagram for a causal
theory, like Figure 4. Conditional independence also plays an important role in
structural equations models with latent variables, but that is a subject worthy
of another paper. The importance of conditional independence suggests the use
of two types of arrows in path diagrams: e.g., solid arrows to indicate causal
relations and something like dashed arrows to indicate conditional independence.
In the complex causal models of the current literature, such distinctions are
not made and all arrows indicate causality. This is a mistake in my opinion and
leads to careless and casual causal talk. I hope that my illustration of how
Rubin's model can be used to give precision to causal modeling will stimulate
similar analyses of more complex causal models.
APPENDIX: A BRIEF REVIEW OF RUBIN’S MODEL FOR EXPERIMENTS AND OBSERVATIONAL
STUDIES

Discussions of Rubin's model similar to the one here may be found in
Holland (1986a, b). Uses of the model in a variety of significant applications
appear in Rubin (1974, 1977, 1978), Holland and Rubin (1983, 1987), Rosenbaum
and Rubin (1983a, b, 1984a, b, 1985a, b), Rosenbaum (1984a, b, c, 1987) and Holland
(1988). In the simplest case, the logical elements of Rubin's model form a
quadruple (U, K, S, Y) where

U is a population of units,

K is a set of causes or treatments to which
each one of the units in U may be exposed,

S(u) = s if s is the cause in K to which u is actually
exposed, and

Y(u,s) = the value of the response that would be observed
if unit $u \in U$ were exposed to cause $s \in K$.

The meaning of Y(u,s) needs some explanation. The response variable, Y,
depends both on the unit and on the cause or treatment to which the unit is
exposed. The idea is that if u were exposed to $t \in K$, then we would observe the
response value Y(u,t); but if u were exposed to $c \in K$, then we would observe the
response value Y(u,c). The requirement that Y be a function on pairs (u,s)
means that Y(u,s) represents the measurement of some property of u after u is
exposed to cause $s \in K$. This has the important consequence of forcing the things
that are called "causes" in K to be potentially exposable to any unit in U.
This restriction on the notion of cause is of fundamental importance because it
prevents us from interpreting a variety of associations as causal: e.g.,
associations between sex and income or between race and crime. This is
discussed more extensively in Holland (1986b, section 7) and in Holland (1988).

The function Y is called the response function.

In the references cited at the beginning of this section, a subscript
notation is used for Y(u,s), i.e.,

$Y_s(u) = Y(u,s)$.

The subscript notation is convenient, and I will use it when appropriate.

The mapping, S, is the causal indicator or assignment rule because S indicates
the cause to which each unit is exposed.

The elements of the quadruple (U, K, S, Y) are the primitives of Rubin's model,
and they serve as the undefined terms. All other concepts are defined in terms
of these primitives.

The most basic quantity in need of definition is the observed response on
each unit $u \in U$. This is given by

$Y_S(u) = Y(u, S(u))$.

The value $Y_S(u)$ is the value of Y that is actually observed for unit u. The
observed data for unit u, in the simplest case, is the pair

$(S(u), Y_S(u))$,

where S(u) is the cause or treatment in K to which u is actually exposed and
$Y_S(u)$ is the observed value of the response, Y. It is important to distinguish
$Y_S(u)$ from Y(u,s): $Y_S(u)$ is the response that is actually observed on unit u,
and Y(u,s) is a potentially observed value that is actually observed only if
S(u) = s.

Note that in the subscript notation, the observed response, $Y_S(u)$, is
$Y_{S(u)}(u)$, so that in the usual probabilistic sense, $Y_s$ has a "fixed" subscript
and $Y_S$ has a "random" subscript.
In Rubin’s model, causes are taken as undefined elements of the theory,
and effects are defined in terms of the elements of the model.

Definition. The unit-level causal effect of cause $t \in K$ relative to cause $c \in K$ (as
measured by Y) is the difference,

$Y(u,t) - Y(u,c) = T_{tc}(u)$.

Hence, the causal effect, $T_{tc}(u)$, is the increase in the value of Y(u,t) (which
is what would be observed if u were exposed to t) over that of Y(u,c) (which is
what would be observed if u were exposed to c). Glymour (1986) points out that
in Rubin’s model, effects are defined counterfactually; i.e., their definitions
include sentences of the form "if A were the case then B would be the case"
in which A could be false. It should also be noted that $T_{tc}(u)$ is defined
relatively (i.e., the effect of one cause or treatment is always relative to another
cause) and is defined at the level of individual units.

The Fundamental Problem of Causal Inference. The most vexing problem in causal
inference is that it is impossible to simultaneously observe both Y(u,t) and
Y(u,c) for two distinct causes t and c; therefore the causal effect, $T_{tc}(u)$, is
never directly observable. Rubin’s model makes this explicit by separating the
observed data $(S, Y_S)$ from the function Y. A causal model or a causal theory
is a specification or partial specification of the values of the function Y.

Causal inference consists of combining (a) a causal theory, (b) assumptions
about data collection, and (c) the observed data to draw conclusions about
causal parameters. Many techniques of experimental science are aimed at overcoming
the Fundamental Problem of Causal Inference by assuming plausible causal
theories and then combining them appropriately with data. Some examples of such
causal theories are given in the next several paragraphs.
Unit homogeneity. In a scientific laboratory, care is exercised to prepare
homogeneous samples of material for study. Such care is often taken to make the
following partial specification of Y plausible:

$Y(u,s) = Y(v,s)$ for all $u, v \in U$ and all $s \in K$.

This means that the responses of all units to cause s are the same, i.e., that
the units respond homogeneously to each cause. In Holland (1986a), I called
this the assumption of unit homogeneity. It is a partial specification of Y
because it restricts the values that Y can take on but it does not specify them
completely. If one assumes unit homogeneity, then the causal effect, $T_{tc}(u)$, is
easily seen to be given by

$T_{tc}(u) = Y_t(u) - Y_c(v)$,

for any two distinct units u and v in U. In this case, the effect of t
(relative to c) is constant and does not depend on the unit under consideration
-- a case I call "constant effect" (see below). Unit homogeneity solves the
Fundamental Problem of Causal Inference by letting us use the data from two
units to measure the causal effect on any single unit.
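As a minimal sketch of how unit homogeneity is used, suppose a response function that ignores the unit. The function `Y`, the unit labels, and the response values below are hypothetical.

```python
# Hypothetical response function under unit homogeneity: Y(u, s) ignores u.
RESPONSE = {"t": 12.0, "c": 9.5}   # assumed values for illustration

def Y(u, s):
    return RESPONSE[s]

u, v = "sample_1", "sample_2"

# Expose u to t and v to c; these are the only two values anyone observes.
observed_u = Y(u, "t")
observed_v = Y(v, "c")
estimated_effect = observed_u - observed_v   # Y_t(u) - Y_c(v)

# The unit-level effects, which would be unobservable in practice.
effect_on_u = Y(u, "t") - Y(u, "c")
effect_on_v = Y(v, "t") - Y(v, "c")

print(estimated_effect, effect_on_u, effect_on_v)
```

The two-unit difference matches each unit-level effect only because homogeneity was built into `Y`; with heterogeneous units it generally would not.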

Fisher's null hypothesis. An assumption about Y that has a long history in
statistics and that is formally similar to unit homogeneity is Fisher's null
hypothesis:

$Y(u,s) = Y(u,s')$ for all $u \in U$ and all $s, s' \in K$.

This means that the response of each unit is unaffected by the cause or
treatment to which it is exposed. This is also a partial specification of Y and is
a causal theory. Fisher's null hypothesis addresses the Fundamental Problem of
Causal Inference by assuming that once we observe the value of Y for the pair
(u,s), we know the value of Y for the pair (u,s') for any other value of $s' \in K$.
Under Fisher's null hypothesis,

$T_{tc}(u) = 0$ for all $u \in U$ and all $t, c \in K$.

So far, I have not given any examples in which assumptions about the data
collection process matter. At the population level, "data collection" is
contained in the causal indicator variable, S, since S describes the cause to
which each unit in U is exposed. Suppose we now consider the joint distribution of
S with $\{Y_s : s \in K\}$ as u varies over all of U. By using the term "joint
distribution" I do not mean to imply that S or the $\{Y_s\}$ are stochastic. However, we can
use the language of probability to describe this joint distribution. For
example, $P(S = s)$ is the proportion of units for which S(u) = s and $E(Y_s \mid S = t)$
is the average value of $Y_s$ for all those units for which S(u) = t. This use of
probability notation allows us to discuss other, more statistical, approaches to
solving the Fundamental Problem of Causal Inference in a convenient manner.

The Average Causal Effect. We may define an important causal parameter, the
average causal effect or the ACE, as the average value of $T_{tc}(u)$ over U, or

$\mathrm{ACE}_{tc} = E(T_{tc})$.

In this notation, $E(T_{tc})$ denotes the average value of $T_{tc}(u)$ over all $u \in U$.
But by definition of $T_{tc}(u)$ this is equivalent to the difference

$\mathrm{ACE}_{tc}(Y) = E(Y_t) - E(Y_c)$.

The ACE is a useful summary of the unit-level causal effects, $T_{tc}(u)$, when
$T_{tc}(u)$ varies little as u ranges over U. In some cases, we are interested in
average behavior over the population of units, and in such a case, the ACE is
useful regardless of how much $T_{tc}(u)$ varies with u.

When we look at data, we can only observe S(u) and Y_S(u) over U; hence, we 
can observe data only from the joint distribution of S and Y_S (as opposed to S 
and {Y_s: s ∈ K}). For example, the average value of the observed response Y_S 
among all those units exposed to cause t is 

E(Y_S | S = t) = E(Y_t | S = t), 

and the average value of the observed response among all those units exposed to 
cause c is 

E(Y_S | S = c) = E(Y_c | S = c). 

The difference in average responses between those units exposed to t and those 
units exposed to c is the prima facie average causal effect — the FACE — and 
is given by 

FACE_tc(Y) = E(Y_S | S = t) - E(Y_S | S = c) 

           = E(Y_t | S = t) - E(Y_c | S = c). 

I use FACE and ACE to draw attention to the fact that we can always compute the 
FACE from data but that it does not necessarily equal the quantity about which 
we wish to make an inference, i.e., the ACE. The difference between the FACE 
and the ACE resides in the difference between 

E(Y_t) and E(Y_t | S = t) 

and between 

E(Y_c) and E(Y_c | S = c). 

E(Y_t) is the average of Y_t over all of U, whereas E(Y_t | S = t) is the average 
of Y_t over only those units that are actually exposed to t. The same is true for 
E(Y_c) and E(Y_c | S = c). 
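
The gap between the FACE and the ACE can be seen in a toy calculation. The sketch below assumes a made-up population of six units in which both potential responses are listed alongside the cause actually received; the values are hypothetical and chosen so that the units exposed to t happen to have larger responses.

```python
from fractions import Fraction

# Hypothetical population: unit -> (Y_t(u), Y_c(u), S(u)).
population = {
    "u1": (7, 5, "t"),
    "u2": (4, 4, "c"),
    "u3": (9, 6, "t"),
    "u4": (3, 5, "c"),
    "u5": (6, 2, "t"),
    "u6": (5, 3, "c"),
}

def mean(values):
    """Exact average of a collection of integers."""
    values = list(values)
    return Fraction(sum(values), len(values))

def ace(pop):
    # E(Y_t) - E(Y_c): both averages run over all of U.
    return (mean(yt for yt, _, _ in pop.values())
            - mean(yc for _, yc, _ in pop.values()))

def face(pop):
    # E(Y_t | S = t) - E(Y_c | S = c): each average runs only over
    # the units actually exposed to that cause.
    treated = mean(yt for yt, _, s in pop.values() if s == "t")
    control = mean(yc for _, yc, s in pop.values() if s == "c")
    return treated - control

print(ace(population))   # prints "3/2"
print(face(population))  # prints "10/3"
```

Only `face` uses quantities that could be computed from observed data; `ace` needs the full table of potential responses, which is available here only because the table is invented.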

Independence. It is now time to show the effect of randomization on Rubin's 
model. Suppose S is independent of {Y_s: s ∈ K}. When independence holds we have 

E(Y_t | S = t) = E(Y_t) 

and 

E(Y_c | S = c) = E(Y_c). 

Hence, if S is independent of {Y_s: s ∈ K}, the FACE and the ACE are equal, i.e., 

FACE_tc(Y) = E(Y_t | S = t) - E(Y_c | S = c) 

           = E(Y_t) - E(Y_c) 

           = ACE_tc(Y). 

Thus, independence is important because it relates a causal parameter, i.e., the 
ACE, to an associational parameter, i.e., the FACE, that can be computed or 
estimated from the observed data, S and Y_S. 

Randomization is related to independence in the following way. 
Independence is an assumption about the data collection process, i.e., about the 
relationship between S and Y over the population U. Randomization is a physical 
process that gives plausibility to the independence assumption in some important 
cases. For example, if U were infinite, then the strong law of large numbers 
coupled with randomization implies that almost every realization of S would be 
independent of {Y_s}. Randomization does not always make independence plausible; 
the best example of this is the case of a small population of units. If U con- 
tains only two units, then the physical act of randomization does not make the 
independence assumption plausible, even though it may still be useful in forming 
the basis of a test of Fisher’s null hypothesis. 
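
A rough simulation can make the large-population claim concrete. The sketch below is only an illustration under assumed conditions: the response distributions, the population size of 20,000, the coin-flip assignment, and the self-selection rule are all hypothetical choices, not part of Rubin's model.

```python
import random

def simulate(n_units, assign, seed=0):
    """Return (ACE, FACE) for a hypothetical population of n_units."""
    rng = random.Random(seed)
    # Made-up potential responses; unit effects vary, with mean about 1.
    y_c = [10 * rng.random() for _ in range(n_units)]
    y_t = [yc + 2 * rng.random() for yc in y_c]
    # S(u): the cause each unit is actually exposed to.
    s = [assign(yc, rng) for yc in y_c]
    ace = sum(y_t) / n_units - sum(y_c) / n_units
    treated = [yt for yt, si in zip(y_t, s) if si == "t"]
    control = [yc for yc, si in zip(y_c, s) if si == "c"]
    face = sum(treated) / len(treated) - sum(control) / len(control)
    return ace, face

def randomized(y_c, rng):
    # A fair coin flip that ignores the unit's responses.
    return "t" if rng.random() < 0.5 else "c"

def self_selected(y_c, rng):
    # Units with a large response under c end up exposed to t.
    return "t" if y_c > 5 else "c"

ace_r, face_r = simulate(20000, randomized)
ace_s, face_s = simulate(20000, self_selected)
print(round(face_r - ace_r, 3), round(face_s - ace_s, 3))
```

With the same seed, both runs use the same potential responses, so the ACE is identical; only the assignment differs. Under coin-flip assignment the FACE should land close to the ACE, while under the self-selection rule it overstates the ACE by a wide margin.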

Constant effect. An important causal theory is the constant effect assumption. 
Constant effect holds when T_tc(u) does not depend on u, i.e., when 
T_tc(u) = τ_tc for all u. This is equivalent to 

Y_t(u) = Y_c(u) + τ_tc. 

Thus, constant effect is the same as "additivity" in the ANOVA sense. I prefer 
constant effect since it is more descriptive of the causal theory being 
assumed. 

When constant effect holds, it is easy to see that τ_tc equals ACE_tc(Y); 
i.e., 

ACE_tc(Y) = E(T_tc) = τ_tc. 

What about the FACE? 

FACE_tc(Y) = E(Y_t | S = t) - E(Y_c | S = c) 

           = E(Y_c + τ_tc | S = t) - E(Y_c | S = c) 

           = τ_tc + {E(Y_c | S = t) - E(Y_c | S = c)}. 

Hence, under the constant effect assumption, 

FACE_tc(Y) = ACE_tc(Y) + BIAS, 

where BIAS = E(Y_c | S = t) - E(Y_c | S = c). The term BIAS involves the 
"counterfactual conditional expectation," E(Y_c | S = t), which cannot be computed 
from data because it is the average value of Y_c among all those units that were 
exposed to t (and for which only the value of Y_t is known). Under independence, 
BIAS = 0, and as before, the FACE and the ACE are equal. 
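
The decomposition of the FACE into the ACE plus BIAS can be checked on a small invented example. The sketch below assumes a constant effect of 2 and a made-up table of responses under c; the counterfactual average E(Y_c | S = t) is computable here only because the table is hypothetical.

```python
from fractions import Fraction

TAU = 2  # hypothetical constant effect, so Y_t(u) = Y_c(u) + TAU

# unit -> (Y_c(u), S(u)); values are invented for the example.
units = {
    "u1": (5, "t"),
    "u2": (4, "c"),
    "u3": (6, "t"),
    "u4": (1, "c"),
    "u5": (3, "c"),
}

def mean(values):
    """Exact average of a collection of integers."""
    values = list(values)
    return Fraction(sum(values), len(values))

def cond_mean_yc(cause):
    # E(Y_c | S = cause). For cause == "t" this is the counterfactual
    # conditional expectation, which real data cannot supply.
    return mean(yc for yc, s in units.values() if s == cause)

# ACE: average of Y_t(u) - Y_c(u) over all units.
ace = mean((yc + TAU) - yc for yc, _ in units.values())

# FACE: E(Y_t | S = t) - E(Y_c | S = c).
face = mean(yc + TAU for yc, s in units.values() if s == "t") - cond_mean_yc("c")

# BIAS: E(Y_c | S = t) - E(Y_c | S = c).
bias = cond_mean_yc("t") - cond_mean_yc("c")

print(ace, bias, face)  # prints "2 17/6 29/6"
```

In this table the units exposed to t have larger responses under c than the units exposed to c, so BIAS is positive and the FACE overstates the constant effect.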

Introducing Other Variables Into Rubin's Model. So far, I have discussed the 
simplest form of Rubin's model, in which there is only one variable measured on 
the units — aside from the causal indicator, S. Now suppose there is a second 
variable, X. In Rubin's model, X is introduced as a second real-valued function 
on UxK, X(u,s). The fact that X is real-valued is not important; it could be 
vector-valued. What is important is that we allow for the fact that, in 
general, X could depend on both u and s. A special class of variables are the 
covariates. 

Definition. X is a covariate if X(u,s) does not depend on s for any u ∈ U. 

In Holland (1986a), I used the term attributes to refer to covariates, but I 
think the latter term is preferable because it corresponds to normal 
experimental usage. Variables measured on units prior to their exposure to treatments 
are always covariates. Rosenbaum (1984c) discusses post-treatment concomitants 
and their use in statistical adjustments. A post-treatment concomitant is a 
variable measured after the exposure of a unit to the causes in K. For a post- 
treatment concomitant, the possibility that X(u,s) does depend on s cannot be 
ignored and must be decided. If X(u,s) does depend on s then X is not a 
covariate in the sense used here. 
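
The definition of a covariate can be stated as a simple check on a table of values X(u,s). The sketch below is purely illustrative: the helper `is_covariate`, the variable names, and the values are hypothetical, and in practice X(u,s) is observed for only one s per unit, so the check cannot be run on real data.

```python
def is_covariate(x_table):
    """x_table maps each unit u to a dict {s: X(u, s)}.

    Returns True when X(u, s) takes a single value across s for every unit.
    """
    return all(len(set(by_cause.values())) == 1 for by_cause in x_table.values())

# A pre-exposure measurement: the same value whichever cause is applied.
age = {
    "u1": {"t": 34, "c": 34},
    "u2": {"t": 51, "c": 51},
}

# A post-treatment concomitant that responds to the cause for unit u1.
hours_studied = {
    "u1": {"t": 12, "c": 7},
    "u2": {"t": 9, "c": 9},
}

print(is_covariate(age))            # prints "True"
print(is_covariate(hours_studied))  # prints "False"
```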

Observational Studies . When the active experimenter is replaced by a passive 
observer who cannot arrange the values of S(u) to achieve independence, we enter 
the realm of observational studies. In such studies we are also interested in 
measuring causal effects; i.e., Rubin's model still applies, but now S is not 
automatically independent of {Y_s}. In an observational study, we typically have 
a covariate, X, and we may check the distribution of X in each exposure group by 
comparing the values of 

P(X = x | S = s) 

across the values of s ∈ K. If there is evidence that P(X = x | S = s) depends on s, 
then, depending on the nature of X and Y, we may not believe that the 
independence assumption holds in an observational study. However, we might be 
willing to entertain a weaker conditional independence assumption of the form: 
given the covariate, X, the variables S and {Y_s: s ∈ K} are conditionally 
independent. Combined with the assumption that P(S = s | X = x) > 0, the 
conditional independence assumption is called strong ignorability by Rosenbaum 
and Rubin (1983a). 

Strong ignorability is the basis for all covariate-adjusted causal effects 
in observational studies. Covariate adjustments are based on the conditional 
expectations or regression functions E(Y_S | S = s, X = x), which are used to form 
the covariate-adjusted FACE, i.e., the C-FACE, given by 

C-FACE_tc(Y) = E{E(Y_S | S = t, X) - E(Y_S | S = c, X)}. 

The C-FACE is like the FACE in that it is generally not equal to the ACE, but 
under conditional independence it is: 

C-FACE_tc(Y) = E{E(Y_t | S = t, X) - E(Y_c | S = c, X)} 

             = E{E(Y_t | X) - E(Y_c | X)} 

             = E(Y_t) - E(Y_c) 

             = ACE_tc(Y). 
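
The covariate adjustment can be traced through a small invented population. The sketch below assumes a binary covariate X and a made-up table built so that, within each level of X, the units exposed to t and to c have the same mix of potential responses (conditional independence), while the chance of exposure to t differs across levels of X. All values are hypothetical.

```python
from fractions import Fraction

def mean(values):
    """Exact average of a collection of integers."""
    values = list(values)
    return Fraction(sum(values), len(values))

# Each row is (X, Y_t, Y_c, S). P(S = t | X = 0) = 1/3, P(S = t | X = 1) = 2/3.
units = (
    [(0, 3, 1, "t"), (0, 5, 2, "t")]
    + [(0, 3, 1, "c"), (0, 5, 2, "c")] * 2
    + [(1, 10, 6, "t"), (1, 12, 9, "t")] * 2
    + [(1, 10, 6, "c"), (1, 12, 9, "c")]
)

def observed_mean(rows, cause):
    # Average observed response Y_S among the rows exposed to `cause`.
    return mean((yt if s == "t" else yc) for _, yt, yc, s in rows if s == cause)

# ACE = E(Y_t) - E(Y_c), using the full table of potential responses.
ace = mean(yt for _, yt, _, _ in units) - mean(yc for _, _, yc, _ in units)

# FACE = E(Y_S | S = t) - E(Y_S | S = c), ignoring X.
face = observed_mean(units, "t") - observed_mean(units, "c")

# C-FACE = E{E(Y_S | S = t, X) - E(Y_S | S = c, X)}, averaging over P(X = x).
c_face = Fraction(0)
for x in sorted({row[0] for row in units}):
    stratum = [row for row in units if row[0] == x]
    weight = Fraction(len(stratum), len(units))
    c_face += weight * (observed_mean(stratum, "t") - observed_mean(stratum, "c"))

print(ace, face, c_face)  # prints "3 31/6 3"
```

Because exposure depends on X and X is related to the responses, the unadjusted FACE (31/6) differs from the ACE (3), while the C-FACE recovers it. Both `face` and `c_face` use only observed responses; `ace` is available only because the table is invented.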

Rubin's model was really developed to address the problem of causal 
inference in observational studies, and thorough discussions of its application 
to these types of studies can be found in Rubin (1974, 1977), Holland and Rubin 
(1983), Rosenbaum and Rubin (1983a, b, 1984b, 1985a, b), and Rosenbaum (1984a, b, c; 
1987). 